# CLIP Architecture

| Model | License | Description | Tags | Publisher | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- |
| Vit Large Patch14 Clip 224.laion2b | Apache-2.0 | Vision Transformer model based on the CLIP architecture, specialized for image feature extraction. | Image Classification, Transformers | timm | 502 | 0 |
| Vit Large Patch14 Clip 224.datacompxl | Apache-2.0 | Vision Transformer model based on the CLIP architecture, designed for image feature extraction and released by the LAION organization. | Image Classification, Transformers | timm | 14 | 0 |
| Vit Base Patch16 Clip 224.laion2b | Apache-2.0 | Vision Transformer model based on the CLIP architecture. It contains only the image encoder and is suited to image feature extraction. | Image Classification, Transformers | timm | 4,460 | 0 |
| Vit Base Patch16 Plus Clip 240.laion400m E31 | MIT | Vision-language model trained on the LAION-400M dataset, supporting zero-shot image classification. | Image Classification | timm | 37.23k | 0 |
| Resnet50x4 Clip.openai | MIT | ResNet50x4 vision-language model based on the CLIP architecture, supporting zero-shot image classification. | Image-to-Text | timm | 2,303 | 0 |
| Chinese Clip Vit Base Patch16 | not listed | Chinese CLIP model based on the ViT architecture, supporting multimodal understanding of images and text. | Text-to-Image, Transformers | Xenova | 264 | 1 |
| CLIP ViT B 16 CommonPool.L.clip S1b B8k | MIT | Vision-language model based on the CLIP architecture, supporting zero-shot image classification. | Text-to-Image | laion | 138 | 0 |
| CLIP ViT B 32 DataComp.M S128m B4k | MIT | Vision-language model based on the CLIP architecture, trained on the DataComp.M dataset and supporting zero-shot image classification. | Text-to-Image | laion | 212 | 0 |
| CLIP ViT B 32 CommonPool.M.laion S128m B4k | MIT | Vision-language model based on the CLIP architecture, supporting zero-shot image classification. | Text-to-Image | laion | 65 | 0 |
| CLIP ViT B 32 CommonPool.S S13m B4k | MIT | Vision-language model based on the CLIP architecture, supporting zero-shot image classification. | Text-to-Image | laion | 79 | 0 |
| Eva02 Base Patch16 Clip 224.merged2b S8b B131k | MIT | CLIP model based on the EVA02 architecture, suited to zero-shot image classification. | Text-to-Image | timm | 29.73k | 0 |
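
Several of the entries above are described as supporting zero-shot image classification. As a rough illustration of what that means for a CLIP-style model, the sketch below scores one image embedding against a set of text-prompt embeddings by cosine similarity and turns the scores into probabilities with a softmax. The embeddings are hand-written toy vectors, not the output of any model listed here; the function names and the `logit_scale` value of 100 are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per text prompt for a single image embedding.

    logit_scale is an assumed temperature; real checkpoints store their own.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the scaled similarities.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts and toy 3-dimensional embeddings (illustration only).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, [round(p, 4) for p in probs])
```

In practice the image and text embeddings would come from the image and text encoders of a checkpoint. Entries described as containing only the image encoder provide the image side alone, which is why they are listed for feature extraction rather than zero-shot classification.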